Detecting Arabic Cloaking Web Pages Using Hybrid Techniques

Authors

  • Heider A. Wahsheh
  • Mohammed N. Al-Kabi
  • Izzat M. Alsmadi
Abstract

Many challenges are emerging in the ever-expanding Internet environment, for both Internet users and Web site owners. Internet users need to retrieve high-quality information relevant to their queries within a short period of time if they are to become regular users who are satisfied with search engine performance. Web site owners, in most cases, aim to raise the rank of their Web pages within search engine results pages (SERPs) to attract more customers to their Web sites, gaining more visits and, in turn, more revenue. A top rank within SERPs is very important to e-commerce and commercial Web pages: when their pages appear among the top results, site owners can attract more visitors and earn more revenue through Pay Per Click. This paper proposes a new approach to Arabic Web spam detection, dedicated to cloaking Web pages, that uses hybrid techniques of content and link analysis. As part of the proposed detection system, the first Arabic cloaking dataset was built, containing around 5,000 Arabic cloaked Web pages. The system extracts all possible rules from HTML elements to monitor cloaking behaviors, and then uses three classification algorithms (K-NN, Decision Tree, and Logistic Regression) in the experimental tests. This novel system achieved an accuracy of 94.1606% in detecting cloaking behaviors in Arabic Web pages.
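
The abstract does not list the features the system extracts, so the following is only a minimal sketch of the general idea behind content and link analysis for cloaking: compare the copy of a page served to a crawler with the copy served to a browser. The function names (`cloaking_features`, `jaccard_distance`) and the choice of Jaccard distance over terms and links are assumptions made for illustration, not the authors' method.

```python
from html.parser import HTMLParser
import re


class _PageFeatures(HTMLParser):
    """Collects visible terms and link targets from one HTML copy (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.terms = set()
        self.links = set()
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            # \w is Unicode-aware for str patterns, so Arabic words are matched too
            self.terms.update(re.findall(r"\w+", data.lower()))


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cloaking_features(crawler_html, browser_html):
    """Assumed feature vector: term and link distance between the two copies."""
    crawler, browser = _PageFeatures(), _PageFeatures()
    crawler.feed(crawler_html)
    crawler.close()
    browser.feed(browser_html)
    browser.close()
    return {
        "term_distance": jaccard_distance(crawler.terms, browser.terms),
        "link_distance": jaccard_distance(crawler.links, browser.links),
    }
```

In a pipeline of the kind the abstract describes, vectors like these would be labeled and passed to a classifier such as K-NN, a decision tree, or logistic regression; that step is omitted here because the abstract gives no training details.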

Similar resources

Cloaking and Redirection: A Preliminary Study

Cloaking and redirection are two possible search engine spamming techniques. In order to understand cloaking and redirection on the Web, we downloaded two sets of Web pages, once while mimicking a popular Web crawler and once as a common Web browser. We estimate that 3% of the first data set and 9% of the second data set utilize cloaking of some kind. By manually checking a sample of the cloaking pages fr...

Detecting Stealth Web Pages That Use Click-Through Cloaking

Search spam is an attack on search engines’ ranking algorithms that promotes spam links to top search rankings they do not deserve. Cloaking is a well-known search spam technique in which spammers serve one page to search-engine crawlers to optimize ranking, but serve a different page to browser users to maximize potential profit. In this experience report, we investigate a different and rela...

Detecting Cloaking Web Spam Using Hash Function

Web spam is an attempt to boost the ranking of special pages in search engine results. Cloaking is one kind of spamming technique. Previous cloaking detection methods, based on term and link differences between the crawler's and the browser's copies of a page, are not accurate enough. The latest technique is a tag-based method, which finds cloaked pages better than previous algorithms. However, addressing the c...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

Improving Cloaking Detection using Search Query Popularity and Monetizability

Cloaking is a search engine spamming technique used by some Web sites to deliver one page to a search engine for indexing while serving an entirely different page to users browsing the site. In this paper, we show that the degree of cloaking among search results depends on query properties such as popularity and monetizability. We propose estimating query popularity and monetizability by analyz...

Publication date: 2013